Field-scale crop water consumption estimates reveal potential water savings in California agriculture

Efficiently managing agricultural irrigation is vital for food security today and into the future under climate change. Yet, evaluating agriculture’s hydrological impacts and strategies to reduce them remains challenging due to a lack of field-scale data on crop water consumption. Here, we develop a method to fill this gap using remote sensing and machine learning, and leverage it to assess water saving strategies in California’s Central Valley. We find that switching to lower water intensity crops can reduce consumption by up to 93%, but this requires adopting uncommon crop types. Northern counties have substantially lower irrigation efficiencies than southern counties, suggesting another potential source of water savings. Other practices that do not alter land cover can save up to 11% of water consumption. These results reveal diverse approaches for achieving sustainable water use, emphasizing the potential of sub-field scale crop water consumption maps to guide water management in California and beyond.

Supplementary Fig. 5 Observed, fallow, and agricultural evapotranspiration (ET) by crop type in the Central Valley. Mean agricultural ET by crop type (blue fill and 95% CI) is the average difference between observed ET (black outline) and naturally-occurring ET (cream fill). While we find significant differences in mean agricultural ET across crop types, the gray box plots also show a broad spread in agricultural ET within crop types (box plots show the 0.05, 0.25, 0.5, 0.75, and 0.95 quantiles).

Supplementary Note 1: Empirical vs. modeled agricultural ET
In the absence of empirical agricultural evapotranspiration (ET) estimates, modeling theoretical crop water demand can be a useful tool for water management. One popular model developed specifically for California is CalSIMETAW, which estimates crop-specific water demand at the county level from crop coefficients (a measure of crop-specific water intensity), weather data, and several other variables [1].
County-level, crop-specific water demand estimates from CalSIMETAW are available for 2000-2015. To retrieve modeled estimates of agricultural ET from CalSIMETAW's water demand estimates, we remove naturally-occurring ET in two different ways: (1) by subtracting our empirical estimates of naturally-occurring ET, or (2) by subtracting precipitation as reported by CalSIMETAW. In line with how agricultural ET is generally calculated in modeling, we compute agricultural ET at a monthly time step and set it to zero wherever it is negative before aggregating to a yearly estimate. For comparison, we aggregate our empirical agricultural ET estimates to the corresponding crop categories at the county level, also computed monthly using the same protocol.
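The monthly clipping rule can be sketched as follows. This is a minimal illustration, assuming each series holds monthly depths in cm; the function name `annual_ag_et` and the toy values are hypothetical and are not CalSIMETAW output. The toy series shows why the order of operations matters: clipping before summing prevents negative months from offsetting positive ones.

```python
import numpy as np


def annual_ag_et(monthly_demand, monthly_counterfactual):
    """Annual agricultural ET (cm) from monthly series.

    monthly_demand: modeled crop water demand for each month.
    monthly_counterfactual: naturally-occurring ET or precipitation for the same months.
    """
    demand = np.asarray(monthly_demand, dtype=float)
    counterfactual = np.asarray(monthly_counterfactual, dtype=float)
    # Monthly agricultural ET, with negative months set to zero.
    monthly_ag = np.clip(demand - counterfactual, 0.0, None)
    # Aggregate the clipped monthly values to a yearly estimate.
    return float(monthly_ag.sum())


demand = [2.0, 5.0, 10.0, 15.0]         # hypothetical monthly demand (cm)
counterfactual = [4.0, 3.0, 3.0, 2.0]   # hypothetical monthly counterfactual (cm)
print(annual_ag_et(demand, counterfactual))   # clipped monthly, then summed: 22.0
print(sum(demand) - sum(counterfactual))      # summed first, for contrast: 20.0
```

With these toy values, the first month's negative difference is set to zero rather than subtracted, so the annual total is 22 cm instead of 20 cm.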
We find a correlation of 0.74 between empirical and modeled county-level, crop-specific agricultural ET when using our naturally-occurring ET estimates as the counterfactual for the modeled estimates (Supplementary Fig. 6A). Modeled agricultural ET has a positive bias of 14.1 cm per year (95% CI: 12.4-15.8) relative to the empirical estimates. Using precipitation as the counterfactual weakens the relationship between empirical and modeled agricultural ET: the correlation drops to 0.69 and the bias increases to 34.2 cm per year (95% CI: 32.1-36.3) (Supplementary Fig. 6B).
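The two summary statistics reported above, a Pearson correlation and a mean bias with a 95% interval, can be computed as sketched below. The percentile bootstrap used for the interval is an assumption for illustration; the text does not state how its confidence intervals were derived. Inputs are paired county-by-crop values in cm per year, `compare_estimates` is a hypothetical helper, and the example numbers are made up.

```python
import numpy as np


def compare_estimates(modeled, empirical, n_boot=2000, seed=0):
    """Correlation, mean bias (modeled - empirical), and a bootstrap 95% CI of the bias."""
    modeled = np.asarray(modeled, dtype=float)
    empirical = np.asarray(empirical, dtype=float)
    r = np.corrcoef(modeled, empirical)[0, 1]
    diff = modeled - empirical
    bias = diff.mean()
    # Percentile bootstrap (assumed method): resample county-crop pairs with replacement.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return r, bias, (lo, hi)


# Hypothetical paired values (cm/yr), not the study's data.
empirical = np.array([40.0, 55.0, 62.0, 71.0, 90.0])
modeled = np.array([52.0, 70.0, 74.0, 88.0, 101.0])
r, bias, ci = compare_estimates(modeled, empirical)
print(f"r = {r:.2f}, bias = {bias:.1f} cm/yr, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```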
We note the discrepancy in the years used to calculate the modeled (2000-2015) versus empirical (2016, 2018, 2019) agricultural ET estimates. However, we believe that the biases found are not a result of this difference: if the climate differs between these sets of years, the more recent years are those that have been hottest and driest [2]. Because hotter, drier conditions raise crop water demand, modeling those same years would likely increase the modeled estimates further, so one could expect the bias to be even larger if the same years were used for the analysis.

Supplementary Note 2: The effect of orchard age on agricultural ET
Water consumption is known to be affected by orchard age, especially for young orchards [3]. One advantage of data-driven, field-scale estimates of agricultural ET is that they can capture such variation.
Since information about orchard age is not currently available in California, we use transitions in the LandIQ dataset from "young perennial" classifications to crop-specific orchard categories as a proxy for orchard age. "Young perennials" are young trees that do not yet bear fruit, so their change to a fruit-bearing orchard classification helps us estimate the age of the orchard. LandIQ crop type data are available for 2014, 2016, 2018, 2019, and 2020. Since we have agricultural ET data for 2016, 2018, and 2019, we can date orchards that have been bearing fruit for up to five years (i.e., orchards for which we have agricultural ET data in 2019 and that were last classified as young perennials in 2014). Using this technique, we uncover a clear relationship between orchard age and agricultural ET (Supplementary Fig. 7).

Supplementary Note 3: Empirical vs. theoretical irrigation efficiency
In the absence of empirical irrigation efficiency measures, irrigation efficiency is often approximated [4,5] using technology-specific irrigation efficiency estimates drawn from reports [6][7][8].
Specifically, irrigation efficiency is calculated as the product of conveyance efficiency, management efficiency, and application efficiency. Conveyance efficiency captures losses in bringing water from its source to the farm, management efficiency describes losses on the farm, and application efficiency captures losses during or after application. We calculate county-specific irrigation efficiency using efficiency values from the aforementioned reports. We set conveyance efficiency to 0.85, management efficiency to 0.95, and application efficiency to 0.60 for flood irrigation, 0.75 for sprinkler irrigation, and 0.95 for drip irrigation. The frequency of adoption of each irrigation technology in each county is retrieved from the same USGS dataset from which we obtain reports of water diverted for irrigation.
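A minimal sketch of the theoretical calculation follows, assuming the county value weights each technology's application efficiency by its adoption share (the text does not spell out the weighting). The efficiency constants are the ones listed above; the function name and the adoption shares in the example are hypothetical.

```python
CONVEYANCE = 0.85
MANAGEMENT = 0.95
APPLICATION = {"flood": 0.60, "sprinkler": 0.75, "drip": 0.95}


def theoretical_efficiency(adoption_shares):
    """Conveyance x management x adoption-weighted application efficiency.

    adoption_shares maps technology name -> share (or acreage) in the county;
    values are normalized, so raw acreages also work.
    """
    total = sum(adoption_shares.values())
    if total <= 0:
        raise ValueError("adoption shares must sum to a positive value")
    application = sum(APPLICATION[tech] * share for tech, share in adoption_shares.items()) / total
    return CONVEYANCE * MANAGEMENT * application


print(round(theoretical_efficiency({"flood": 0.5, "sprinkler": 0.2, "drip": 0.3}), 4))  # 0.5935
print(round(theoretical_efficiency({"drip": 1.0}), 4))  # 0.7671
```

Because conveyance and management efficiency are constants here, county differences in theoretical efficiency come entirely from the mix of irrigation technologies.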
Our empirical estimates of irrigation efficiency are not significantly different from what theoretical estimates would predict, though theoretical estimates tend to underestimate efficiency in southern counties and overestimate it in northern counties (Supplementary Fig. 8). Theoretical numbers may underestimate irrigation efficiency in the south because they capture only improvements in irrigation technology, not other practices that might increase efficiency. For example, deficit irrigation [9], careful irrigation scheduling [10], and precision agriculture [11] can all improve irrigation efficiency. Empirical estimates of irrigation efficiency may therefore capture the effect of drivers of irrigation efficiency beyond the technology being used.
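Empirical efficiency, as described in the captions of Supplementary Figs. 8 and 9, divides agricultural ET by water diverted for irrigation; because the ET and irrigation years do not overlap, every pairing of years can be evaluated. The sketch below assumes both quantities are county totals in the same volume units and that the reported average is the mean over all pairings. Both are assumptions, and the helper name and all numbers are hypothetical.

```python
from itertools import product
from statistics import mean


def efficiency_by_year_pair(ag_et_by_year, irrigation_by_year):
    """Empirical irrigation efficiency for every (ET year, irrigation year) pairing."""
    return {
        (et_year, irr_year): ag_et / irrigation
        for (et_year, ag_et), (irr_year, irrigation) in product(
            ag_et_by_year.items(), irrigation_by_year.items()
        )
    }


# Hypothetical county totals (same volume units for both).
ag_et = {2016: 900.0, 2018: 1000.0, 2019: 1100.0}
irrigation = {2010: 1600.0, 2015: 2000.0}

pairs = efficiency_by_year_pair(ag_et, irrigation)
for (et_year, irr_year), eff in sorted(pairs.items()):
    print(et_year, irr_year, round(eff, 3))
print("mean over pairings:", round(mean(pairs.values()), 4))  # 0.5625
```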

Supplementary Note 4: The effect of OpenET and machine learning model error on agricultural ET estimates
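We define agricultural ET as the difference between total ET and naturally-occurring ET, $ET_{ag} = ET_{tot} - ET_{nat}$ (Main text eq. 1), where naturally-occurring ET is the counterfactual ET that would occur naturally were the same land fallow. However, we calculate agricultural ET using estimates of total ET from OpenET and modeled naturally-occurring ET trained on OpenET data (Supplementary eq. 1). Since the OpenET data and the model we use to estimate naturally-occurring ET have errors associated with them, our estimates of agricultural ET can be described as the true agricultural ET plus several error terms (Supplementary eq. 4), which we derive as follows:

$$ET^*_{ag} = \widehat{ET}_{tot} - \widetilde{ET}_{nat} \quad (1)$$

where $ET^*_{ag}$ denotes our estimate of agricultural ET, $\widehat{ET}_{tot}$ is OpenET's estimate of $ET_{tot}$, and $\widetilde{ET}_{nat}$ is our estimate of the naturally-occurring ET, predicted by our machine learning (ML) model. The hat notation indicates estimates inclusive of OpenET error, while the tilde indicates estimates inclusive of ML model error. Because our naturally-occurring ET ML model is trained on, and therefore predicts, OpenET values over fallow lands, we can rewrite the same expression as:

$$ET^*_{ag} = \widehat{ET}_{tot} - \left(\widehat{ET}_{nat} + \epsilon_{ML}\right) \quad (2)$$

where $\widehat{ET}_{nat}$ is the (naturally-occurring) ET that OpenET would observe if the field were fallow, and $\epsilon_{ML}$ is the error from the ML model in predicting $\widehat{ET}_{nat}$.

We can then reorganize the same equation again by separating the error from the OpenET observations, writing $\widehat{ET}_{tot} = ET_{tot} + \epsilon_{OpenET,ag}$ and $\widehat{ET}_{nat} = ET_{nat} + \epsilon_{OpenET,fal}$:

$$ET^*_{ag} = \left(ET_{tot} + \epsilon_{OpenET,ag}\right) - \left(ET_{nat} + \epsilon_{OpenET,fal}\right) - \epsilon_{ML} \quad (3)$$

This reveals the relationship between the true $ET_{ag}$ and our estimate, $ET^*_{ag}$:

$$ET^*_{ag} = ET_{ag} + \epsilon_{OpenET,ag} - \epsilon_{OpenET,fal} - \epsilon_{ML} \quad (4)$$

Taking expectations of both sides gives:

$$E[ET^*_{ag}] = E[ET_{ag}] + E[\epsilon_{OpenET,ag}] - E[\epsilon_{OpenET,fal}] - E[\epsilon_{ML}] \quad (5)$$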
We show in Supplementary Figures 1-4 that $\epsilon_{ML}$ is unbiased, that is, $E[\epsilon_{ML}] = 0$. OpenET has also been shown to produce unbiased estimates over agricultural lands, so $E[\epsilon_{OpenET,ag}] = 0$ (see Table 3 on page 6 of the OpenET Intercomparison and Accuracy report) [12]. We can also assume that $\epsilon_{OpenET,fal}$ is unbiased: although OpenET has not been evaluated specifically over fallow fields, it produces unbiased estimates over natural shrublands and grasslands, whose ET values are of similar magnitude to those of fallow fields in the Central Valley [12]. As a result, our estimates of agricultural ET are unbiased in expectation and consequently produce unbiased regression coefficients:

$$E[ET^*_{ag}] = E[ET_{ag}] \quad (6)$$
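The claim that unbiased ET estimates yield unbiased regression coefficients can be checked with a small simulation. This sketch assumes ordinary least squares and an estimation error that has zero mean and is independent of the regressor, which is the condition under which the claim holds; the function name, covariate, coefficients, and error scales are all hypothetical.

```python
import numpy as np


def ols_slopes(n=100_000, true_slope=2.0, error_sd=5.0, seed=0):
    """Fit y ~ x on true and error-contaminated outcomes; return both slopes."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n)                              # hypothetical field-level covariate
    y_true = 30.0 + true_slope * x + rng.normal(0.0, 3.0, n)   # true agricultural ET
    y_est = y_true + rng.normal(0.0, error_sd, n)              # zero-mean error, independent of x
    slope_true = np.polyfit(x, y_true, 1)[0]                   # polyfit returns [slope, intercept]
    slope_est = np.polyfit(x, y_est, 1)[0]
    return slope_true, slope_est


slope_true, slope_est = ols_slopes()
print(round(slope_true, 2), round(slope_est, 2))  # both close to 2.0
```

The added error widens the slope's standard error but does not shift its expected value.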
While our estimates of agricultural ET are unbiased in expectation, the error terms add variance that is not present in the true agricultural ET. Therefore, any analysis that assesses variation across pixels will reflect inflated variance relative to true agricultural ET. For example, this has implications for the farming practices and fallowing scenarios we analyze (Fig. 3). However, we estimate that error accounts for only 11% of the variance in our annual estimates of agricultural ET, suggesting that its influence on key results is limited. Specifically, if we assume that all error terms are independent of each other and of the true agricultural ET, we have:

$$\mathrm{var}(ET^*_{ag}) = \mathrm{var}(ET_{ag}) + \mathrm{var}(\epsilon_{OpenET,ag}) + \mathrm{var}(\epsilon_{OpenET,fal}) + \mathrm{var}(\epsilon_{ML}) \quad (7)$$

where $\mathrm{var}(\cdot)$ denotes the variance.
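The variance decomposition in eq. 7 can be checked numerically. The simulation below draws a hypothetical true agricultural ET and three independent, zero-mean error terms, combines them additively as in Supplementary eq. 4 (the signs of the error terms do not affect the variance), and compares the observed variance with the sum of the component variances. All standard deviations are made up for illustration and are not the values used in this study.

```python
import numpy as np


def simulate_error_decomposition(n=200_000, seed=42):
    """Monte Carlo check that independent, zero-mean errors add variance but not bias."""
    rng = np.random.default_rng(seed)
    sd_true, sd_ag, sd_fal, sd_ml = 20.0, 4.0, 3.0, 2.0   # hypothetical standard deviations (cm/yr)
    et_ag = rng.normal(60.0, sd_true, n)                  # true agricultural ET
    eps_openet_ag = rng.normal(0.0, sd_ag, n)             # OpenET error over cropland
    eps_openet_fal = rng.normal(0.0, sd_fal, n)           # OpenET error over fallow land
    eps_ml = rng.normal(0.0, sd_ml, n)                    # ML error in predicting fallow ET
    et_ag_star = et_ag + eps_openet_ag - eps_openet_fal - eps_ml
    bias = et_ag_star.mean() - et_ag.mean()               # near 0: unbiased in expectation
    error_var = sd_ag**2 + sd_fal**2 + sd_ml**2           # summed error variances
    observed_var = et_ag_star.var()
    expected_var = sd_true**2 + error_var                 # right-hand side of eq. 7
    return bias, observed_var, expected_var, error_var / observed_var


bias, observed_var, expected_var, error_share = simulate_error_decomposition()
print(f"bias = {bias:.3f}, var = {observed_var:.1f} (expected {expected_var:.1f})")
print(f"share of variance from error = {error_share:.1%}")
```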
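Because variance is equal to mean squared error (MSE) when bias is 0, we can rewrite eq. 7 as:

$$\mathrm{var}(ET^*_{ag}) = \mathrm{var}(ET_{ag}) + \mathrm{MSE}(\epsilon_{OpenET,ag}) + \mathrm{MSE}(\epsilon_{OpenET,fal}) + \mathrm{MSE}(\epsilon_{ML}) \quad (8)$$

We calculate the MSE of our machine learning model on our test set, and OpenET provides root MSE values over croplands and shrublands, which we use to approximate error over fallow lands (again, see Table 3 on page 6 of the OpenET Intercomparison and Accuracy report) [12]. We find that for estimates of yearly $ET^*_{ag}$, $\mathrm{MSE}(\epsilon_{OpenET,ag}) + \mathrm{MSE}(\epsilon_{OpenET,fal}) + \mathrm{MSE}(\epsilon_{ML})$ represents only 11% of $\mathrm{var}(ET^*_{ag})$. This indicates that the vast majority of the variance we uncover reflects true differences in water consumption across fields rather than variability due to model errors.

Supplementary Fig. 9 Uncertainty in irrigation efficiency estimates. To calculate irrigation efficiency, we divide agricultural evapotranspiration (ET) by irrigation amount, but the years covered by the two datasets do not overlap. Counties are ordered by latitude, with more northern counties on the left. The average irrigation efficiency is shown as a red x, but we also assess the effect of using different years in the numerator and denominator by combining every pairing of agricultural ET (shape) and irrigation (color) data. We note that 2015 and 2016 were drought years, while 2010, 2018, and 2019 were wetter. We do find substantial spreads in irrigation efficiency driven by the year of irrigation data, especially in some southern counties. However, most of the spread is driven by combining wet and dry years, suggesting that averaging across years may remove much of this variability.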
switching scenario yields 11.7% savings (95.3% if switching to the minimum-consuming crop), farming practices lead to 13.5% savings, and fallowing 5% of land reduces agricultural ET by 10.0% (Supplementary Fig. 11). All trends are the same as in the main text.

Supplementary Fig. 11 The percent reduction in agricultural evapotranspiration (ET) driven by various management scenarios without cleaning training data. This is the equivalent of Fig. 3 in the main text, but calculated using a naturally-occurring ET model trained on uncleaned data.
lower, at 54.6% (46.1, 63.2) on average. However, we still find significantly lower irrigation efficiency in the northern counties (Supplementary Fig. 12).

Supplementary Fig. 14 The percent reduction in agricultural evapotranspiration (ET) driven by various management scenarios using training data also marked fallow by the Cropland Data Layer. This is the equivalent of Fig. 3 in the main text, but calculated using a naturally-occurring ET model trained on data also marked fallow by the Cropland Data Layer.

Supplementary Fig. 6 Simulated vs. empirical agricultural evapotranspiration (ET). Simulated agricultural ET is calculated from theoretical crop water demand from the CalSIMETAW model by subtracting either (A) naturally-occurring ET estimated using machine learning or (B) precipitation. Each point represents the agricultural ET for a crop in a specific county in the California Central Valley.

Supplementary Fig. 7 Water consumption by life stage of orchards in California's Central Valley. Prior to bearing fruit, orchards have low agricultural water consumption, or evapotranspiration (ET). Water consumption then increases rapidly during the first 5 years of fruit production, after which it stabilizes.

Supplementary Fig. 8 Empirical vs. theoretical irrigation efficiency across the counties of the California Central Valley. Empirical irrigation efficiency is calculated by dividing empirical estimates of agricultural ET by reports of total water diverted for irrigation. Theoretical irrigation efficiency is calculated from efficiency estimates and the reported irrigation technology used in each county. Theoretical irrigation efficiency underestimates empirical estimates for more southern counties and overestimates them for more northern counties.